密码学 Kryptografie

Basics

我们首先来看一下Shannon是如何定义加密的：

Schannon cipher 是一个加密方案（encryption scheme）$\Pi := (E,D)$ ，作用于$(\mathcal{K},\mathcal{M},\mathcal{C})$上，其中：

$\mathcal{K}$ 是一个有限的加密密钥集合。
$\mathcal{M}$ 是一个有限的消息集合。
$\mathcal{C}$ 是一个有限的密文集合。
$E: \mathcal{K} \times \mathcal{M} \to \mathcal{C}$ 是加密函数。
$D: \mathcal{K} \times \mathcal{C} \to \mathcal{M}$ 是解密函数。
并且满足Correctness property：$\forall k \in \mathcal{K},\ m \in \mathcal{M} :\ D(k, E(k, m)) = m$。

我们来看一个很简单的例子：One-time-pad (OTP)。

$\mathcal{K} = \mathcal{M} = \mathcal{C} := \{0,1\}^L$ with $L$ as a fixed length $\geq 1$.
$E(k, m) := k \oplus m$
$D(k, c) := k \oplus c$

（这里的$\oplus$表示的是xor运算。）

注意到：

$D(k, E(k, m)) = D(k, k \oplus m) = k \oplus (k \oplus m) = (k \oplus k) \oplus m = 0^L \oplus m = m$

所以OPT是一个Schannon cipher。

Definition 1: (Perfect Ciphertext Indistinguishability / perfect secrecy / perfect security)

一个定义在 $(\mathcal{K}, \mathcal{M}, \mathcal{C})$ 上的 Shannon cipher $\Pi = (E, D)$，满足密文完全不可区分（has perfectly indistinguishable ciphertexts）（或者说是绝对安全 perfectly secure的），当

$\forall m_0, m_1 \in \mathcal{M},\ c \in \mathcal{C} :\ \Pr[E(k, m_0) = c] = \Pr[E(k, m_1) = c].$

其中概率是针对密钥的分布而言，$k$ 是一个随机变量。

我们会假设每个密钥 $k$ 都是从 $\mathcal{K}$ 里uniformly挑选的，即每个密钥被挑选的概率都等于 $\frac{1}{\mathcal{K}}$。

设

$N_{m, c} = \left| \left\{ k \in \mathcal{K} : E(k, m) = c \right\} \right|$

为可以将m加密成c的密钥数量，那么我们可以将定义里的等式重新重新表述为：

$\forall m_0, m_1 \in \mathcal{M},\ c \in \mathcal{C} :\ N_{m_0, c} = N_{m_1, c}$

也就是说对于任意2个明文和1个密文，都存在同样多的密钥可以将这两段密文都加密成这段密文。

Theorem 1:

The one-time pad (OTP) is perfectly secure.

证明：

Given an OTP, for all $m \in \mathcal{M}$ there is exactly one unique $k \in \mathcal{K}$ such that $E(k, m) = c$. It follows that $N_{m,c} = 1$，因为

存在：There is at least one key $k := c \oplus m$ with $E(c \oplus m, m) = c \oplus m \oplus m = c$.
唯一：Assuming two keys $k_0, k_1 \in \mathcal{K}$ exist where $E(k_0, m) = E(k_1, m) = c$:

$\begin{aligned} E(k_0, m) &= E(k_1, m) \\ k_0 \oplus m &= k_1 \oplus m \\ k_0 \oplus m \oplus m &= k_1 \oplus m \oplus m \\ k_0 &= k_1 \end{aligned}$

所以有：

$\forall m_0, m_1 \in \mathcal{M}, c \in \mathcal{C} : N_{m_0, c} = N_{m_1, c}=1$

. $\square$

Theorem 2:

如果$\Pi = (E, D)$是绝对安全的（perfectly secure），那么有 $|\mathcal{K}| \geq |\mathcal{M}|$。

证明：

我们假设 $|\mathcal{K}| < |\mathcal{M}|$，并任意选择一段明文$m_0$和一个密钥$k’$。设$c := E(k’,m_0)$，那么有

$Pr[E(k,m_0)=c] > 0$

设

$S = \{ D(\hat{k},c) \in \mathcal{M}: \hat{k} \in \mathcal{K} \}$

为c所有可能的解密结果。选择$m_1\in \mathcal{M}\backslash S$ （这个集合非空是因为 $|S| \leq |\mathcal{K}| < |\mathcal{M}|$），那么就有

$0 < \Pr[E(k, m_0) = c] \ne \Pr[E(k, m_1) = c] = 0$

a contradiction. $\square$

该如何理解这个定理呢？

其主要说明的就是当密钥空间小于信息空间时，可能会对于有些密文只存在一个（可能的）对应的明文。

然而在实际情况里，一直要求密钥空间大于信息空间其实不太现实，因为我们需要各方各面都传输大量的信息。这就导致Shannon提出来的这个perfect secrecy概念不够实用（practical）。

其根本的原因是，shannon提出来的概念着重于杜绝所有潜在的威胁。但实际上对我们来说，有些威胁是可以接受的：比如说质因数分解很大的数需要大量的计算资源与时间，对我们很难造成什么实质性的威胁。又或者是我运气够好，可以一下子猜到密码。

所以说我们实际追求的是无法以不可忽略的（non-negligible）概率被高效（efficiently）破解的高效加密算法，。

What we would like to have are efficient encryption schemes that cannot be broken “efficiently” with a “non-negligible” probability.

比如说质因数分解，它的理论成功概率很高，但是效率非常低，所以可以被忽视。而猜密码这种方法的成功概率太低了，所以也不需要理会。

重新表述一下就是：“不存在一个算法可以在时间T内以p的概率成功破解我们的加密算法”。

Definition 2 (Polynomially-bounded):

A function $f : \mathbb{Z}_+ \to \mathbb{R}$ is called polynomially-bounded if there exist $c, n_0 \in \mathbb{N}$ such that for all integers $n \geq n_0$, we have

$|f(n)| \leq n^c \quad (f(n) \in \mathcal{O}(n^c)).$

Definition 3 (PPT algorithm):

An algorithm is called efficient, if its runtime as a function of its input length is polynomially-bounded.

We call this algorithm a probabilistic polynomial time (PPT) algorithm.

就其实efficient和PPT是一个意思。

Definition 4 (Efficiently sampleable):

A set $S$ is efficiently sampleable if there is a PPT-algorithm that is able to sample an element from it uniformly and randomly.
For such a sampling, we use the notation $s \xleftarrow{R} S$.

Definition 5 (Negligible):

A function

$f: \mathbb{Z}_+ \to \mathbb{R}$

is called negligible if

$\forall c \in \mathbb{N} \ \exists n_0 \in \mathbb{Z}_+ \ \text{ s.t. for all integers } n \geq n_0 :|f(n)| \leq \frac{1}{n^c} \quad \left(f(n) \in \mathcal{O}\left(\frac{1}{n^c}\right)\right).$

$f$的值会无限小。

例子：$\frac{1}{2^n},\quad \frac{1}{n^{\log n}}$，the probability of randomly choosing a specific key from a set of $2^n$ $n$-bit keys

Definition 6 (Super-poly):

A function $f : \mathbb{Z}_+ \to \mathbb{R}$ is called super-poly if $\frac{1}{f}$ is negligible.

翻译一下：

对于所有的 $c \in \mathbb{N}$ 都存在一个 $n_0 \in \mathbb{Z}_+$使得 for all integers $n \geq n_0$, we have ：

$\begin{align*} |\frac{1}{f(n)}| &\leq \frac{1}{n^c} \\ \Leftrightarrow f(n) &\geq n^c \end{align*}$

例子：The runtime of an algorithm enumerating all $n$-bit keys to possible ciphertexts for a message $m$ by calculating $E(k, m)$ for all $2^n$ keys is super-poly.

Let $c \in \mathbb{R}_+$ be a constant,

$f_{neg}, g_{neg}$

be negligible functions,

$f_{poly}, g_{poly}$

be polynomially-bounded functions, and

$f_{spoly}, g_{spoly}$

be super-polynomial functions.

Then:

通常我们会定义2个参数：

Security parameter $\lambda$: A positive integer.
System parameter $\Lambda$: A bit string.

$\lambda$ is chosen when a cipher is deployed. 一般来说，$\lambda$ 的值越大，密码系统的安全等级越高。较高的值通常与较长的密钥相关联，同时对加解密操作 $D$ 和 $E$ 的性能带来负面影响。

而$\Lambda$ is not chosen but determined by $\lambda$ for a specific cipher. 比如说，$\lambda$可能表示模数 $n$ 的位长度（length in bits），$n$本身则为成为System parameter$\Lambda$。

这两个参数都是公开的。

The parameterization applies the key, message and ciphertext spaces

$\{\mathcal{K}_{\lambda, \Lambda}\}_{\lambda, \Lambda},\quad \{\mathcal{M}_{\lambda, \Lambda}\}_{\lambda, \Lambda},\quad \{\mathcal{C}_{\lambda, \Lambda}\}_{\lambda, \Lambda}$

Definition 7 (efficient algorithm):

Let $A$ be an algorithm that takes as input a security parameter $\lambda \in \mathbb{Z}_{\geq 1}$, as well as other parameters encoded as a bit string $x \in \{0,1\}^{\leq p(\lambda)}$ for some fixed polynomial $p$.

We call $A$ an efficient algorithm if there exist a poly-bounded function $t$ and a negligible function $\epsilon$
such that for all $\lambda$ and $x$, the probability that the running time of $A$ on input $(\lambda, x)$ exceeds $t(\lambda)$ is at most $\epsilon(\lambda)$.

翻译一下：A的输入为一个 security parameter $\lambda \in \mathbb{Z}_{\geq 1}$, 以及被 encoded as a bit string $x \in \{0,1\}^{\leq p(\lambda)}$ （for some fixed polynomial $p$）的其他的parameters。

Definition 8 (computational cipher):

A computational cipher is a pair of efficient algorithms $\Pi := (E,D)$ such that:

$k \in \mathcal{K}$ is an encryption key.
$m \in \mathcal{M}$ is a message.
$c \in \mathcal{C}$ is a ciphertext.
$E : \mathcal{K} \times \mathcal{M} \rightarrow \mathcal{C}$ is the encryption algorithm.
$D : \mathcal{K} \times \mathcal{C} \rightarrow \mathcal{M}$ is the decryption algorithm.
$\mathcal{K}, \mathcal{M}$ and $\mathcal{C}$ are finite spaces.

computational cipher的定义其实和Shannon cipher就只差了一个 efficient algorithms 的条件。

当然在现实生活中，传输数据时会产生各种随机的噪音，所以$E$ 可以是 non-deterministic。在这种情况下我们会引入一个新的参数randomness：$r \in \mathcal{R}$ ，并且用

$c \xleftarrow{R} E(k, m) \quad\text{ or }\quad r \xleftarrow{R} \mathcal{R} \quad\text{ or }\quad c \leftarrow E(k, m; r)$

表示（受随机数的影响的）加密后的结果。

（这里箭头上的$R$就只是表面当前考虑的是受随机数的影响的加密）

而correctness requirement也需要随之改写成：

$\forall c \xleftarrow{R} E(k, m) : D(k, c) = m$

（即就算受随机数的影响，也需要可以成功解密出来。感觉跟coding theory很像。）

当$E$ 是 deterministic 的时候，我们会把$\Pi = (E,D)$ 叫作 deterministic cipher。（比如说OTP）

一个deterministic cipher显然是Shannon cipher。

之前我们定义的 Perfect Ciphertext Indistinguishability 是这样的：

$\forall m_0, m_1 \in \mathcal{M},\ c \in \mathcal{C} :\ \Pr[E(k, m_0) = c] = \Pr[E(k, m_1) = c].$

而我们希望将这个定义推广到可以适应所有的 predicate $\phi$（一个判断函数，输出结果只为true或者false。也就是一个从$\mathcal{C}$ 映射到$\{0,1\}$的函数。比如说判断“密文的前4位是否全为0”。）：

$\forall \phi \in \{0,1\}^{\mathcal{C}},\ m_0, m_1 \in \mathcal{M}:\ \Pr[\phi(E(k, m_0))] = \Pr[\phi(E(k, m_1))]$

给这个式子引入一个误差（a negligible chance for an adversary to break our scheme by demanding）就会变成：

$\forall \phi \in \{0,1\}^{\mathcal{C}},\ m_0, m_1 \in \mathcal{M}:\ |\Pr[\phi(E(k, m_0))] - \Pr[\phi(E(k, m_1))]| \leq \epsilon$

这个便是Statistical Ciphertext Indistinguishability的定义。

而所谓的Computational Ciphertext Indistinguishability便是在此基础上继续要求 predicate $\phi$ 是efficient。

Attack Game

我们通常会用一个Attack Game来形式化定义安全性。会有一个Adversary $\mathcal{A}$，如果他的输出不同于任何”trivial attack”（比如说随机猜测），那么我们就说他破解（break）了我们的加密。我们会用$\mathcal{A}$’s advantage 来衡量他对我们的加密有多大的影响。

Attack Game 1 (Ciphertext Indistinguishability):

For a given cipher $\Pi = (E, D)$ defined over $(\mathcal{K}, \mathcal{M}, \mathcal{C})$ and for a given adversary $\mathcal{A}$, we define two experiments.
For $b \in \{0,1\}$, Experiment $b$ proceeds as follows:

The adversary computes $m_0, m_1 \in \mathcal{M}$ of the same length and sends them to a challenger.
The challenger computes
$k \xleftarrow{R} \mathcal{K},c \xleftarrow{R} E(k, m_b)$
and sends $c$ to the adversary.
The adversary outputs a bit $\hat{b} \in \{0,1\}$.
Let $W_b$ be the event that $\mathcal{A}$ outputs 1.（$\mathcal{A}$在实验b的时候输出1）

图示：

Definition:

We define $\mathcal{A}$’s ciphertext distinguishing advantage or semantic security advantage as

$SCadv[\mathcal{A}, \Pi] := |\Pr[W_0] - \Pr[W_1]|$

Definition 9 (Ciphertext Indistinguishability):

A cipher $\Pi := (E, D)$ has (computationally) indistinguishable ciphertexts,
if $SC_{adv}[\mathcal{A}, \Pi]$ is negligible for all PPT adversaries $\mathcal{A}$.

如果对于所有adversaries $\mathcal{A}$都有：

$SC_{adv}[\mathcal{A}, \Pi] = 0$

那么它一定是(computationally) indistinguishable ciphertexts。

例子：

Trivial attacks
- 如果$\mathcal{A}$永远输出1，那么
  $SCadv[\mathcal{A}, \Pi] := |\Pr[W_0] - \Pr[W_1]| = |1-1| = 0$
- 如果$\mathcal{A}$通过扔硬币选择输出，那么
  $SCadv[\mathcal{A}, \Pi] := |\Pr[W_0] - \Pr[W_1]| = |\frac{1}{2} - \frac{1}{2}| = 0$

如果挑选2条一样的明文，那么advantage也等于0，因为不可能通过任何手段可以判断出来到底加密的是哪条信息，所以只能猜，也就是$|\frac{1}{2} - \frac{1}{2}| = 0$。
假如这样设计一个加密：$E(k, m) := (\text{parity}(m), E’(k, m))$，即加密结果包含2个信息，第一个是明文$m$的奇偶性，第二个是真正的加密内容，那么就会存在一个 adversary with advantage 1：
- $\mathcal{A}$挑选2个明文，一个的parity等于1，一个的等于0.那么他就可以通过$E(k, m)$的第一个component的值（parity）来判断明文到底是那条。
对于perfectly secure ciphers，所有的adversary has advantage 0 。

如果我们让challenger随机挑选b，那么就会得到下面这种attack game：

Attack Game 2 (Ciphertext Indistinguishability — bit guessing) :

For a given cipher $\Pi = (E, D)$ defined over $(\mathcal{K}, \mathcal{M}, \mathcal{C})$ and for a given adversary $\mathcal{A}$, the experiment proceeds as follows:

The adversary computes $m_0, m_1 \in \mathcal{M}$ of the same length and sends them to a challenger.
The challenger computes $b \xleftarrow{R} \{0,1\}$, $k \xleftarrow{R} \mathcal{K}$, $c \xleftarrow{R} E(k, m_b)$
The adversary outputs a bit $\hat{b} \in \{0,1\}$
Let $W$ be the event that $\hat{b} = b$

图示：

Definition:

We define $\mathcal{A}$’s bit-guessing advantage as:

$SCadv^*[\mathcal{A}, \Pi] := |\Pr[W] - \tfrac{1}{2}|$

Lemma:

$SCadv^*[\mathcal{A}] = \frac{1}{2}SCadv[\mathcal{A}]$

证明：

因为b是随机挑选的，所以

$\text{Pr}[b = 0] = \text{Pr}[b = 1] = \frac{1}{2}$

那么有

$\begin{align*} \text{Pr}[W] &= \text{Pr}[\hat{b} = b] \\ &= \text{Pr}[\hat{b} = b \mid b = 0] \cdot \text{Pr}[b = 0] + \text{Pr}[\hat{b} = b \mid b = 1] \cdot \text{Pr}[b = 1] \\ &= \frac{1}{2} \cdot \left( \text{Pr}[\hat{b} = 0 \mid b = 0] + \text{Pr}[\hat{b} = 1 \mid b = 1] \right) \end{align*}$

根据之前$W_0,W_1$的定义可以得到：

$\text{Pr}[\hat{b} = 0 \mid b = 0] = 1 - \text{Pr}[\hat{b} = 1 \mid b = 0] = 1 - \text{Pr}[W_0]$ $\text{Pr}[\hat{b} = 1 \mid b = 1] = \text{Pr}[W_1]$

所以：

$\begin{align*} SCadv^*[\mathcal{A}, \Pi] &= \left| \text{Pr}[W] - \frac{1}{2} \right| \\ &= \left| \frac{1}{2} \cdot (1 - \text{Pr}[W_0] + \text{Pr}[W_1]) - \frac{1}{2} \right| \\ &= \frac{1}{2} \cdot \left| \text{Pr}[W_1] - \text{Pr}[W_0] \right| \\ &= \frac{1}{2} \cdot SCadv[\mathcal{A}, \Pi] \end{align*}$

. $\square$

Theorem 3:

The one-time pad （OTP） has indistinguishable ciphertexts.

证明：

我们用bit-guessing game来证明这个thm。

Let $\mathcal{A}$ be any adversary. Assume it outputs messages $m_0, m_1 \in \{0,1\}^L$ for some length $L$.

Since $k$ is randomly chosen, $m_b \oplus k$ is a uniformly random bitstring of length $L$, independent of $b$. 所以说只能随机猜测b。

所以

$SCadv^*[\mathcal{A}, \Pi_{\text{OTP}}] = | \text{Pr}[W] - \frac{1}{2} | =| \text{Pr}[b = \hat{b}] - \frac{1}{2} |= 0$

.$\square$

Secret key cryptography

Definition 10 (Pseudo-random generator):

A pseudo-random generator (PRG) is an efficiently computable function

$G : \mathcal{S} \to \mathcal{R}$

where:

$|\mathcal{S}| < |\mathcal{R}|$
$\mathcal{S}$ (usually $\{0,1\}^l$) is called the seed space.
$\mathcal{R}$ (usually $\{0,1\}^L$, $L > l$) is called the output space.

我们可以继续哟个attack game来formalize “random”的概念。

Attack Game 3 (PRG advantage):

For a given PRG $G$ defined over $(\mathcal{S}, \mathcal{R})$ and for a given adversary $\mathcal{A}$, we define two experiments.
For $b = 0,1$, Experiment $b$ proceeds as follows:

The challenger computes $r \in \mathcal{R}$ as follows and sends it to $\mathcal{A}$:
- if $b = 0$: $s \xleftarrow{R} \mathcal{S},\ r \leftarrow G(s)$
- if $b = 1$: $r \xleftarrow{R} \mathcal{R}$
The adversary outputs a bit $\hat{b} \in \{0,1\}$
Let $W_b$ be the event that $\mathcal{A}$ outputs 1

图示：

Definition:

we define $\mathcal{A}$’s PRG distinguishing advantage as:

$PRGadv[\mathcal{A}, G] := |\Pr[W_0] - \Pr[W_1]|$

Definition 11 (Secure PRG):

A PRG $G$ is secure, if $PRGadv[\mathcal{A}, G]$ is negligible for all PPT adversaries $\mathcal{A}$.

Definition 12 (Stream Cipher Construction):

Given a PRG $G : \mathcal{K} \to \{0,1\}^L$, we define the basic stream cipher $\Pi = (E, D)$ over
$(\mathcal{K}, \{0,1\}^L, \{0,1\}^L)$ as follows:

$E(k, m) := G(k) \oplus m$
$D(k, c) := G(k) \oplus c$